Optimizing Data Scheduling on Processor-in-Memory Arrays
Abstract
In the PetaFlop project, the Processor-In-Memory (PIM) array was proposed as a target architecture for achieving 10^15 floating-point operations per second. However, one of the major obstacles to achieving this performance is interprocessor communication, which lengthens the total execution time of an application. A good data schedule, consisting of an initial data placement and data movement at run time, can significantly reduce the total communication cost and the execution time of the application. In this paper, we propose efficient algorithms for the data scheduling problem. Experimental results show the effectiveness of the proposed approaches. Compared with default data distribution methods such as row-wise or column-wise distributions, the average improvement on the tested benchmarks is up to 30%.
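As a rough illustration of why initial data placement matters, the sketch below compares the communication cost of a default placement with a cost-minimizing one on a small Processor-in-Memory mesh. It is not the algorithm proposed in the paper: the cost model (Manhattan hop distance between the processor that stores a datum and each processor that references it), the exhaustive search, and the names `comm_cost` and `best_home` are all assumptions made for this example.

```python
from typing import List, Tuple

# Assumption: processors sit on a 2-D mesh and are addressed as (row, col).
Proc = Tuple[int, int]


def manhattan(a: Proc, b: Proc) -> int:
    """Hop distance between two processors on the mesh (assumed cost model)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def comm_cost(home: Proc, refs: List[Proc]) -> int:
    """Total hops needed when the datum lives at `home` and is read once
    by every processor listed in `refs`."""
    return sum(manhattan(home, p) for p in refs)


def best_home(refs: List[Proc], rows: int, cols: int) -> Proc:
    """Exhaustively pick the processor that minimizes comm_cost.
    Hypothetical helper: fine for a small mesh, not meant to scale."""
    candidates = [(r, c) for r in range(rows) for c in range(cols)]
    return min(candidates, key=lambda h: comm_cost(h, refs))


# One datum referenced by three processors in the far corner of a 4x4 mesh.
refs = [(3, 3), (3, 2), (2, 3)]
default_home = (0, 0)  # e.g. where a row-wise distribution might put it
chosen_home = best_home(refs, rows=4, cols=4)

print(comm_cost(default_home, refs))  # 16
print(chosen_home)                    # (3, 3)
print(comm_cost(chosen_home, refs))   # 2
```

A full data schedule would also have to account for the cost of moving the datum between execution phases, which this static sketch leaves out.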
Similar Resources
Towards Truly Boolean Arrays in Data-Parallel Array Processing
Booleans are the most basic values in computing. Machines, however, store Booleans in larger compounds such as bytes or integers due to limitations in addressing memory locations. For individual values the relative waste of memory capacity is huge, but the absolute waste is negligible. The latter radically changes if large numbers of Boolean values are processed in (multidimensional) arrays. Mo...
A High Performance Parallel IP Lookup Technique Using Distributed Memory Organization and ISCB-Tree Data Structure
The IP lookup process is a key bottleneck in routing due to the increase in routing table size, increasing traffic, and migration to IPv6 addresses. IP address lookup involves computation of the Longest Prefix Match (LPM), for which existing solutions, such as BSD Radix Tries, scale poorly when traffic in the router increases or when employed for IPv6 address lookups. In this paper, we describe a ...
Interface Synthesis using Memory Mapping for an FPGA Platform
Several system-on-chip (SoC) platforms have recently emerged that use reconfigurable logic (FPGAs) as a programmable co-processor to reduce the computational load on the main processor core. We present an interface synthesis approach that enables us to do hardware-software codesign for such FPGA-based platforms. The approach is based on a novel memory mapping algorithm that maps data used by bo...
A new approach to model communication for mapping and scheduling DSP-applications
We present a novel approach to model inter-processor communication in multi-DSP systems. In most multi-DSP systems, inter-processor communication is realized by transferring data over point-to-point links with hardware FIFO buffers. Direct memory access (DMA) is additionally used to concurrently transfer data to the FIFO buffers and perform computation. Our model accounts for the limited size of ...